Gyeonggi-do
- Asia > South Korea > Gyeonggi-do > Suwon (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
Text-Printed Image: Bridging the Image-Text Modality Gap for Text-centric Training of Large Vision-Language Models
Yamabe, Shojiro, Waseda, Futa, Shiono, Daiki, Takahashi, Tsubasa
Recent large vision-language models (LVLMs) have been applied to diverse VQA tasks. However, achieving practical performance typically requires task-specific fine-tuning with large numbers of image-text pairs, which are costly to collect. In this work, we study text-centric training, a setting where only textual descriptions are available and no real images are provided, as a paradigm for low-cost data scaling. Unlike images, whose collection is often restricted by privacy constraints and scarcity in niche domains, text is widely available. Moreover, text is easily editable, enabling automatic diversification and expansion with LLMs at minimal human effort. While this offers clear advantages over image collection in terms of scalability and cost, training on raw text without images still yields limited gains on VQA tasks because of the image-text modality gap. To address this issue, we propose a Text-Printed Image (TPI), which generates synthetic images by directly rendering the given textual description on a plain white canvas. This simple rendering projects text into the image modality and can be integrated into arbitrary existing LVLM training pipelines at low cost. Moreover, TPI preserves the semantics of the text, whereas text-to-image models often fail to do. Across four models and seven benchmarks, our systematic experiments show that TPI enables more effective text-centric training than synthetic images generated by a diffusion model. We further explore TPI as a low-cost data-augmentation strategy and demonstrate its practical utility. Overall, our findings highlight the significant potential of text-centric training and, more broadly, chart a path toward fully automated data generation for LVLMs.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Asia > South Korea > Gyeonggi-do > Suwon (0.04)
- Asia > Middle East > Jordan (0.04)
- (2 more...)
- Information Technology (0.46)
- Health & Medicine (0.46)
A CNN-Based Technique to Assist Layout-to-Generator Conversion for Analog Circuits
Jeong, Sungyu, Kim, Minsu, Kim, Byungsub
We propose a technique to assist in converting a reference layout of an analog circuit into the procedural layout generator by efficiently reusing available generators for sub-cell creation. The proposed convolutional neural network (CNN) model automatically detects sub-cells that can be generated by available generator scripts in the library, and suggests using them in the hierarchically correct places of the generator software. In experiments, the CNN model examined sub-cells of a high-speed wireline receiver that has a total of 4,885 sub-cell instances including different 145 sub-cell designs. The CNN model classified the sub-cell instances into 51 generatable and one not-generatable classes. One not-generatable class indicates that no available generator can generate the classified sub-cell. The CNN model achieved 99.3% precision in examining the 145 different sub-cell designs. The CNN model greatly reduced the examination time to 18 seconds from 88 minutes required in manual examination. Also, the proposed CNN model could correctly classify unfamiliar sub-cells that are very different from the training dataset.
- Asia > South Korea > Gyeongsangbuk-do > Pohang (0.05)
- North America > United States > Oregon > Washington County > Hillsboro (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (2 more...)
- Workflow (0.94)
- Research Report (0.64)
Probabilistic Wildfire Spread Prediction Using an Autoregressive Conditional Generative Adversarial Network
Climate change has intensified the frequency and severity of wildfires, making rapid and accurate prediction of fire spread essential for effective mitigation and response. Physics-based simulators such as FARSITE offer high-fidelity predictions but are computationally intensive, limiting their applicability in real-time decision-making, while existing deep learning models often yield overly smooth predictions that fail to capture the complex, nonlinear dynamics of wildfire propagation. This study proposes an autoregressive conditional generative adversarial network (CGAN) for probabilistic wildfire spread prediction. By formulating the prediction task as an autoregressive problem, the model learns sequential state transitions, ensuring long-term prediction stability. Experimental results demonstrate that the proposed CGAN-based model outperforms conventional deep learning models in both overall predictive accuracy and boundary delineation of fire perimeters. These results demonstrate that adversarial learning allows the model to capture the strong nonlinearity and uncertainty of wildfire spread, instead of simply fitting the pixel average. Furthermore, the autoregressive framework facilitates systematic temporal forecasting of wildfire evolution. The proposed CGAN-based autoregressive framework enhances both the accuracy and physical interpretability of wildfire spread prediction, offering a promising foundation for time-sensitive response and evacuation planning.
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- Asia > South Korea > Gyeonggi-do > Suwon (0.04)
- South America > Brazil (0.04)
- (6 more...)
Length-MAX Tokenizer for Language Models
We introduce a new tokenizer for language models that minimizes the average tokens per character, thereby reducing the number of tokens needed to represent text during training and to generate text during inference. Our method, which we refer to as the Length-MAX tokenizer, obtains its vocabulary by casting a length-weighted objective maximization as a graph partitioning problem and developing a greedy approximation algorithm. On FineWeb and diverse domains, it yields 14--18\% fewer tokens than Byte Pair Encoding (BPE) across vocabulary sizes from 10K to 50K, and the reduction is 13.0\% when the size is 64K. Training GPT-2 models at 124M, 355M, and 1.3B parameters from scratch with five runs each shows 18.5\%, 17.2\%, and 18.5\% fewer steps, respectively, to reach a fixed validation loss, and 13.7\%, 12.7\%, and 13.7\% lower inference latency, together with a 16\% throughput gain at 124M, while consistently improving on downstream tasks including reducing LAMBADA perplexity by 11.7\% and enhancing HellaSwag accuracy by 4.3\%. Moreover, the Length-MAX tokenizer achieves 99.62\% vocabulary coverage and the out-of-vocabulary rate remains low at 0.12\% on test sets. These results demonstrate that optimizing for average token length, rather than frequency alone, offers an effective approach to more efficient language modeling without sacrificing -- and often improving -- downstream performance. The tokenizer is compatible with production systems and reduces embedding and KV-cache memory by 18\% at inference.
- North America > United States > Pennsylvania (0.04)
- North America > United States > New York (0.04)
- Asia > South Korea > Gyeonggi-do > Suwon (0.04)
- Research Report > Experimental Study (0.94)
- Research Report > New Finding (0.88)
Q-Learning-Based Time-Critical Data Aggregation Scheduling in IoT
Vo, Van-Vi, Nguyen, Tien-Dung, Le, Duc-Tai, Choo, Hyunseung
Time-critical data aggregation in Internet of Things (IoT) networks demands efficient, collision-free scheduling to minimize latency for applications like smart cities and industrial automation. Traditional heuristic methods, with two-phase tree construction and scheduling, often suffer from high computational overhead and suboptimal delays due to their static nature. To address this, we propose a novel Q-learning framework that unifies aggregation tree construction and scheduling, modeling the process as a Markov Decision Process (MDP) with hashed states for scalability. By leveraging a reward function that promotes large, interference-free batch transmissions, our approach dynamically learns optimal scheduling policies. Simulations on static networks with up to 300 nodes demonstrate up to 10.87% lower latency compared to a state-of-the-art heuristic algorithm, highlighting its robustness for delay-sensitive IoT applications. This framework enables timely insights in IoT environments, paving the way for scalable, low-latency data aggregation.
- Telecommunications (0.47)
- Energy (0.46)
- Information Technology (0.35)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > South Korea > Gyeonggi-do > Suwon (0.04)
Data-driven Prediction of Species-Specific Plant Responses to Spectral-Shifting Films from Leaf Phenotypic and Photosynthetic Traits
Kang, Jun Hyeun, Son, Jung Eek, Ahn, Tae In
The application of spectral-shifting films in greenhouses to shift green light to red light has shown variable growth responses across crop species. However, the yield enhancement of crops under altered light quality is related to the collective effects of the specific biophysical characteristics of each species. Considering only one attribute of a crop has limitations in understanding the relationship between sunlight quality adjustments and crop growth performance. Therefore, this study aims to comprehensively link multiple plant phenotypic traits and daily light integral considering the physiological responses of crops to their growth outcomes under SF using artificial intelligence. Between 2021 and 2024, various leafy, fruiting, and root crops were grown in greenhouses covered with either PEF or SF, and leaf reflectance, leaf mass per area, chlorophyll content, daily light integral, and light saturation point were measured from the plants cultivated in each condition. 210 data points were collected, but there was insufficient data to train deep learning models, so a variational autoencoder was used for data augmentation. Most crop yields showed an average increase of 22.5% under SF. These data were used to train several models, including logistic regression, decision tree, random forest, XGBoost, and feedforward neural network (FFNN), aiming to binary classify whether there was a significant effect on yield with SF application. The FFNN achieved a high classification accuracy of 91.4% on a test dataset that was not used for training. This study provide insight into the complex interactions between leaf phenotypic and photosynthetic traits, environmental conditions, and solar spectral components by improving the ability to predict solar spectral shift effects using SF.
- Europe > Austria > Vienna (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > South Korea > Seoul > Seoul (0.05)
- (6 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine (1.00)
- Food & Agriculture > Agriculture (1.00)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > Canada (0.04)
- Asia > South Korea > Gyeonggi-do > Suwon (0.04)